Determining abundance of deer species in Victoria using camera trap data
The density of deer at a given site can be estimated using camera-trap distance sampling (CTDS). CTDS is a modified form of distance-sampling that allows us to infer the probability a given individual will be detected within the survey area (area in front of the camera). This detection probability is a function of the distance of the individual from the camera, whereby individuals entering the camera field of view further from the camera have a lower detection probability up to a given truncation distance from the camera where detection probability is near-zero.
An underlying assumption about CTDS is that the probability a deer will be available for detection at any given point location within the camera field of view is equal. Under this assumption, for a point transect, we take into account the total area for each distance-bin area, which increases at further distance bins. However, in this study we implement a novel method that considers group size of the detected species in the availability calculations. For larger groups, CTDS should account for the availability of the closest individual rather than the availability of all individuals. When group size increases it is more likely the the closest individual from the group is closer to the camera. If we assume that the triggering of the camera is dependent upon the closest individual than we must adjust our estimated availability to account for variable group sizes. If we do not adjust for group size and only use the distance to the closest individual for our DS models then we will likely underestimate the detection rate as distances will be smaller than reality. Alternatively, if we record distances to multiple individuals in the same photo and take an average or model them independently we would likely overestimate detection probability because individuals at further distances are recorded because a closer individual has triggered the camera trap. For more information on how group size is likely to effect CTDS abundance estimates a simulation study has been written here: https://justincally.github.io/blog/posts/2022-11-10-ctds-for-groups/
In this study we investigate two possible detection functions that may explain how detection rates ‘fall off’ over the distances from the camera. Firstly a half-normal detection function given by:
\[p = exp(-y^2/2\sigma^2) \] Hereby \(p\) is the probability of detection, \(y\) is the distance from the camera (midpoint of the detection bin), and \(\sigma\) is the shape parameter. Alternatively, we compare the fit of this detection function to a hazard-rate function, where detection probability is given by:
\[p = 1 - exp(-(y/\sigma)^{-\theta})\] In these hazard-rate models, an additional parameter (\(\theta\)) is estimated, which is scale parameter. Given that detection rate may not necessarily be uniform across sites we model the magnitude of the shape parameter across sites (i) as:
\[log(\sigma_i) = \alpha + \beta_{det} \times HU_i\] Where \(HU_i\) is the amount of herbaceous understorey measured surrounding the camera. \(\beta_{det}\) is the coefficient for the effect of herbaceous understorey on detection and \(\alpha\) is the intercept.
Using these estimates of detection probability in different bins we can then estimate the average detection probability of an individual at a given camera station. This can then be used to account for imperfect detection when fitting observed counts from the camera trap to parameters estimating the true abundance at a site.
The density at a site is modelled as a function of the observed counts, relative frequency of group sizes, distance-sampling detection probability, survey effort (area in front of camera x time/snapshot moments the camera is deployed for), proportion of time within a 24-hour cycle that deer were active (\(activity\)). This parameter was estimated using the activity R package and included in the model using an informative beta-distribution prior. For a given species the number of detections of groups (1 or more deer) during the deployment is estimated with:
\[C_{n,j} = poisson(exp(log(\lambda_{n}) + log(p_{n,j}) + log(activity) + log(\epsilon gs_{n,j}) + log(r)) \times survey\ area_n)\]
Where \(C_{n,j}\) is the observed count of group size (\(j\)) at a site (\(n\)). \(\lambda_{n}\) is the true density at a site. In this model \(X_n\) are the values of the fixed-effect covariates, \(\beta\) are the fixed effect coefficients and \(\epsilon bioregion_n\) is the bioregion random effect drawn from a standard normal distribution:
\[\lambda_{n} = X_n \times \beta + \epsilon bioregion_n\]
The \(\epsilon gs_{n,j}\) is the proportional variation in group size ar a site, where \(\sum \epsilon gs_{n,j \in [1,max(group\ size)]} = 1\). Assuming that the rate of group sizes can vary between sites, we model the variation in group size between sites with a group-size level intercept (\(\zeta_j\)) and site-group-size level random effect (\(\epsilon site_{n,j}\)):
\[\epsilon psi_{nj} = exp(\zeta_j + \epsilon site_{n,j}) \] The proportional group size at a given site and for a given group size is thus given by:
\[\epsilon gs_{nj} = \frac{\epsilon psi_{nj}}{sum(\epsilon psi_{n})}\]
Our surveys involved the deployment of cameras at a site for usually 6-8 weeks as well as 150m transect searches that were walked up and back by a single individual for two transects, and one transect which was walked once by two observers. This gives us an approximate survey area at each site of 150m x 2 x 3 = 900m. Deer sign was recorded on these transects with pellets, footprints, rubbings, and wallows recorded as either present or absent.
This data provides supplemental presence-absence data that can be integrated into our model to help inform likely densities at sites where cameras did not record observations but one ore more deer signs were detected along the transects.
The foundation for this integration is a Royle-Nichols model (Royle and Nichols 2003) that models the frequency of detection to relative abundance measures. Given that we also collect absolute abundance estimates via the camera trap data, relative abundance measures from this output can be transformed to absolute abundance and thus provide an avenue to integrate camera and transect-based data into a single model estimating abundance.
Load in relevant packages for analysis, additionally, connect to the database. Camera trap and site data is stored on the database.
library(cmdstanr)
library(dplyr)
library(sf)
library(bayesplot)
library(loo)
library(gt)
library(terra)
library(caret)
library(tidyterra)
library(tidyr)
library(VicmapR)
library(kableExtra)
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)
register_knitr_engine(override = FALSE)
options(mc.cores=8)
#### Database connection ####
con <- weda::weda_connect(password = keyring::key_get(service = "ari-dev-weda-psql-01",
username = "psql_user"), username = "psql_user")
Additional functions used in the data preparation, modelling and analysis are available in the /functions directory.
# Source internal functions
sapply(list.files("functions", full.names = T), source, verbose = F)
Wrangle and format data for the STAN models for the various species.
Outline which species should be modeled, and which projects to source data from.
# Species to run model for.
deer_species_all <- c("Cervus unicolor", "Dama dama", "Cervus elaphus")
species_names <- c("Sambar Deer", "Fallow Deer", "Red Deer")
# Projects to select.
project_short_name <- c("hog_deer_2023", "StatewideDeer")
# Buffer for data extraction.
spatial_buffer <- 1000
# Covariate Rasters
raster_files <- "/Volumes/DeerVic\ Photos/Processed_Rasters"
# raster_files <- "data/prediction_raster"
prediction_raster <- "data/prediction_raster/statewide_raster.tif"
# For the integrated model we place limits on the maximum density of deer to integrate over. In cases where there are no detections on the camera this is limited to 15 per km2. In cases where deer were detetected on the camera this is expanded to 50. We believe this is sufficiently high. Very high values of these will be less efficient.
n_max_no_det <- 15
n_max_det <- 30
Download the camera locations from the database, this table outlines the locations and the deployment history of the cameras.
cams_curated <- tbl(con, dbplyr::in_schema("camtrap", "curated_camtrap_operation")) %>%
dplyr::filter(ProjectShortName %in% !!project_short_name) %>% # Only retrieve cam records for hog deer 2023
dplyr::collect() %>%
dplyr::arrange(SiteID) %>%
sf::st_as_sf(., coords = c("Longitude", "Latitude"), crs = 4283)
n_site <- nrow(cams_curated)
Here we outline formulas to be used in the models. The formulas account for the various fixed-effect parameters.
#### Model formulas ####
#### Transect Formula: Survey Only ####
transect_formula <- ~Survey
#### Abundance Formula Options ####
ab_formula_1 <- ~ scale(sqrt(BareSoil)) + scale(sqrt(Nitrogen)) + scale(log(PastureDistance)) + scale(BIO15) + scale(sqrt(ForestEdge)) +
scale(sqrt(MRVBF)) + scale(sqrt(SLOPE)) + scale(sqrt(NonPhotosyntheticVeg))
# Simple Formula
ab_formula_2 <- ~ scale(sqrt(BareSoil)) + scale(sqrt(Nitrogen)) + scale(log(PastureDistance)) + scale(TreeDensity) + scale(sqrt(ForestEdge))
# Simple with Coastal Distance: Hog Deer
ab_formula_3 <- ~ scale(sqrt(BareSoil)) + scale(sqrt(Nitrogen)) + scale(log(PastureDistance)) + scale(TreeDensity) + scale(sqrt(ForestEdge)) + scale(dist_coast)
ab_formula_4 <- ~ scale(sqrt(BareSoil)) + scale(sqrt(Nitrogen)) + scale(log(PastureDistance)) + scale(BIO15) + scale(sqrt(ForestEdge))
ab_formula_5 <- ~ scale(sqrt(BareSoil)) + scale(sqrt(Nitrogen)) + scale(log(PastureDistance)) + scale(BIO15) + scale(sqrt(ForestEdge)) +
scale(sqrt(MRVBF))
#### Detection Formula: Distance-sampling ####
det_formula <- ~ scale(HerbaceousUnderstoryCover) # average detection across all sites #scale(HerbaceousUnderstoryCover)
det_formula2 <- ~1
Using the prepare_model_data() function we generate the data for the STAN model. This function will:
1. Download data from the database 2. Format that data to match the distance-sampling bins
3. Organise the counts into sites and group sizes
4. Generate model matrix of the various submodels (distance-sampling, abundance and transect) 5. Generate data for the prediction process 6. Generate data for the random effect (bioregion)
7. Generate data for regional predictions (indexing based on DEECA regions)
if(!file.exists("data/multispecies_data.rds")) {
multispecies_data <- prepare_model_data_multispecies(species = c("Cervus unicolor",
"Dama dama",
"Cervus elaphus"),
projects = project_short_name,
buffer = spatial_buffer,
detection_formula = det_formula,
abundance_formula = ab_formula_4,
transect_formula = transect_formula,
con = con,
raster_dir = raster_files,
prediction_raster = prediction_raster,
n_max_no_det = n_max_no_det,
n_max_det = n_max_det,
evaltransects = TRUE,
snapshot_interval = 2,
hs_df = 1,
hs_df_global = 1,
hs_scale_global = 1,
hs_scale_slab = 0.5,
hs_df_slab = 4)
multispecies_data$keyfun <- 0 # 0 = HN
multispecies_data$raw_data[is.na(multispecies_data$raw_data)] <- 0
multispecies_data$transects[is.na(multispecies_data$transects)] <- 0
saveRDS(multispecies_data, "data/multispecies_data.rds")
} else {
multispecies_data <- readRDS("data/multispecies_data.rds")
}
Below we list the MCMC setting for our model. We run models on eight parallel chains for 1,000 iterations each (300 warmup and 300 sampling). These setting provide us with 4,000 posterior draws (8 x 500).
# STAN settings
ni <- 200 # sampling iterations
nw <- 200 # warmup iterations
nc <- 8 # number of chains
These are the models used in the analysis. The first is an integrated model that requires transect and camera data. The second is a count-only model that just requires the camera data.
functions {
/* Half-normal function
* Args:
* sigma: sigma
* midpoints: midpoints
* Returns:
* detection probability
*/
vector halfnorm(real sigma, vector midpoints) {
int bins = rows(midpoints);
vector[bins] p_raw; // detection probability
p_raw = exp(-(midpoints)^2/(2*sigma^2));
return p_raw;
}
/* Hazard function
* Args:
* sigma: sigma
* theta: theta
* midpoints: midpoints
* Returns:
* detection probability
*/
vector hazard(real sigma, real theta, vector midpoints) {
int bins = rows(midpoints);
vector[bins] p_raw; // detection probability
p_raw = 1 - exp(-(midpoints/sigma)^(-theta));
return p_raw;
}
vector prob_dist(real sigma, real theta, int keyfun, vector midpoints){
int bins = rows(midpoints);
vector[bins] out; // detection probability
if(keyfun == 0){
out = halfnorm(sigma, midpoints);
} else if(keyfun == 1){
out = hazard(sigma, theta, midpoints);
}
return out;
}
}
data {
int<lower=0> N; // number of observations
int<lower=0> S; // number of species
real delta; // bin width
int<lower=1> n_site; // site
int<lower=1> n_distance_bins; // distance bins
int<lower=1> n_gs; // number of group sizes
vector[n_gs] gs; // group sizes
vector[n_distance_bins] midpts; // midpt of each interval
real<lower=1> max_distance; // truncation distance
int<lower=1> max_int_dist; // max distance as an integer
real<lower=0> theta_frac; // fraction of camera view
array[n_site] int effort; // effort
array[n_site, n_gs, S] int n_obs; //number of observations
array[n_site, n_distance_bins, n_gs] int y; // observations matrix
// summary of whether species is known to be present at each site
int<lower = 0, upper = 1> any_seen[S, n_site];
// number of surveys at each site
int<lower = 0> n_survey[n_site];
// availability prior information
real<lower=0> bshape; // availability shape
real<lower=0> bscale; // availability scale
// camera trap detection parameters
int<lower=0> det_ncb; // number of covariates for detection model
matrix[n_site, det_ncb] det_model_matrix; // detection model matrix
array[n_gs, n_distance_bins] real pa ; // averaged availability for multiple individuals
// Abundance/occupancy model matrix
int<lower = 1> m_psi;
matrix[n_site, m_psi] X_psi;
// negbinom scale
// real reciprocal_phi_scale;
//transect level information
int<lower=1> trans; // total number of transects across all sites for all methods
array[S, trans] int<lower = 0, upper = 1> y2; // transect based binomial detections
int<lower = 0, upper = trans> start_idx[n_site];
int<lower = 0, upper = trans> end_idx[n_site];
int<lower=1> trans_det_ncb; // number of covariates for transect detection model
matrix[trans, trans_det_ncb] trans_pred_matrix; // transect detection model matrix
int<lower=1> n_max[n_site, S]; // max for poisson RN
// Prediction data
int<lower=1> npc; // number of prediction grid cells
matrix[npc, m_psi] X_pred_psi; // pred matrix
vector[npc] prop_pred; //offset
// bioregion RE
int<lower=1> np_bioreg;
int<lower=1> site_bioreg[n_site];
int<lower=1> pred_bioreg[npc];
// region data
int<lower=1> np_reg;
int<lower=1> site_reg[n_site];
int<lower=1> pred_reg[npc];
// key function, 0 = halfnorm
int keyfun;
}
transformed data {
// vector[n_distance_bins] pi; // availability for point
vector[n_site] survey_area;
array[S, n_site] real cam_seen;
for(i in 1:n_site) {
survey_area[i] = theta_frac * effort[i] * pi() * (max_distance/1000)^2;
for(s in 1:S) {
cam_seen[s,i] = sum(n_obs[i,,s]);
}
}
}
parameters {
// abundance parameters
//array[n_site] simplex[n_gs] eps_ngs; // random variation in group size
array[S] vector[m_psi] beta_psi;
vector[det_ncb] beta_det;
real log_theta;
// transect detection parameters
vector[trans_det_ncb] beta_trans_det;
// temporal availability parameters
real<lower=0, upper=1> activ;
// bioregion RE
array[S] real<lower=0> bioregion_sd;
array[S] vector[np_bioreg] bioregion_raw;
// eps group size params
array[S] vector[n_gs] zeta;
array[S] matrix[n_site, n_gs] eps_raw;
array[S] real<lower=0> grp_sd;
}
transformed parameters {
// random effects
array[S] vector[np_bioreg] eps_bioregion; // bioregion random effect
// distance parameters
array[n_site] real log_sigma;
array[n_site] real sigma;
array[n_site] vector[n_distance_bins] p_raw; // detection probability
array[n_site, n_distance_bins, n_gs] real p_raw_scale; // detection probability for point independence model and accounting for availability
array[n_site, n_distance_bins, n_gs] real<upper=0> log_p_raw; // log detection probability accounting for availability
array[n_site, n_gs] real log_p; // log probability with summed exponential for the multinomial logit model
array[n_site, n_gs] real<lower=0,upper=1> p; // log probability with summed exponential for the multinomial logit model
// abundance parameters
array[S, n_site, n_gs] real<lower=0> lambda;
// activity parameters
real log_activ = log(activ);
vector[trans] logit_trans_p = trans_pred_matrix * beta_trans_det; // observation process model
// lp_site for RN model
array[S, n_site] real lp_site;
vector<lower=0,upper=1>[trans] r = inv_logit(logit_trans_p);
array[S] vector[n_site] log_lambda_psi;
// eps group size
array[S, n_site] simplex[n_gs] eps_ngs; // random variation in group size
array[S, n_site] vector[n_gs] epsi;
array[S] matrix[n_site, n_gs] eps_site;
real<lower=0> theta = exp(log_theta);
for(n in 1:n_site) {
log_sigma[n] = det_model_matrix[n,] * beta_det;
sigma[n] = exp(log_sigma[n]);
p_raw[n] = prob_dist(sigma[n], theta, keyfun, midpts);
for (i in 1:n_distance_bins) {
// assuming a half-normal detection fn from line
// p_raw[n,i] = exp(-(midpts[i])^2/(2*sigma[n]^2)); // half-normal function (pg 419 of AHM)
// hazard function
// p_raw[n,i] = 1 - exp(-(midpts[i]/theta)^(-sigma[n])); // hazard function (pg 507 of AHM)
for(j in 1:n_gs) {
p_raw_scale[n,i,j] = p_raw[n,i]*pa[j,i]; // pr(animal occurs and is detected in bin i)
log_p_raw[n,i,j] = log(p_raw_scale[n,i,j]);
}
}
for(s in 1:S) {
for(b in 1:np_bioreg) {
eps_bioregion[s,b] = bioregion_sd[s] * bioregion_raw[s,b];
}
log_lambda_psi[s,n] = X_psi[n,] * beta_psi[s] + eps_bioregion[s,site_bioreg[n]];
for(j in 1:n_gs) {
eps_site[s, n,j] = grp_sd[s] * eps_raw[s,n,j];
epsi[s,n,j] = exp(zeta[s,j] + eps_site[s,n,j]);
}
eps_ngs[s,n,] = epsi[s,n,]/sum(epsi[s,n,]);
}
for(j in 1:n_gs) {
log_p[n,j] = log_sum_exp(log_p_raw[n,,j]);
p[n,j] = exp(log_p[n,j]);
// model site abundance
for(s in 1:S) {
lambda[s,n,j] = exp(log_lambda_psi[s,n] + log_p[n,j] + log_activ + log(eps_ngs[s,n,j]) + log(r[start_idx[n]])) .* survey_area[n];
}
}
// Royle-Nichols implementation in STAN (looping over possible discrete values of N)
// https://discourse.mc-stan.org/t/royle-and-nichols/14150
// https://discourse.mc-stan.org/t/identifiability-across-levels-in-occupancy-model/5340/2
for(s in 1:S) {
if (n_survey[n] > 0) {
vector[n_max[n,s]] lp;
if(any_seen[s,n] == 0){ // not seen
lp[1] = log_sum_exp(poisson_log_lpmf(0 | log_lambda_psi[s,n]), poisson_log_lpmf(1 | log_lambda_psi[s,n]) + bernoulli_lpmf(y2[s,start_idx[n]:end_idx[n]] | r[start_idx[n]:end_idx[n]]));
} else {
lp[1] = poisson_log_lpmf(1 | log_lambda_psi[s,n]) + bernoulli_lpmf(y2[s,start_idx[n]:end_idx[n]] | r[start_idx[n]:end_idx[n]]);
}
for (k in 2:n_max[n,s]){
lp[k] = poisson_log_lpmf(k | log_lambda_psi[s,n]) + bernoulli_lpmf(y2[s,start_idx[n]:end_idx[n]] | 1-(1-r[start_idx[n]:end_idx[n]])^k);
}
lp_site[s,n] = log_sum_exp(lp);
} else {
lp_site[s, n] = 0;
}
}
}
}
model {
for(s in 1:S) {
beta_psi[s] ~ normal(0, 3); // prior for intercept in poisson model
bioregion_sd[s] ~ normal(0,2);
bioregion_raw[s,] ~ normal(0,1);
to_vector(eps_raw[s,,]) ~ std_normal();
grp_sd[s] ~ normal(0, 1);
zeta[s,] ~ normal(0, 2);
}
beta_trans_det ~ normal(0, 2); // beta for transect detection
beta_det ~ normal(0, 4); // prior for sigma
activ ~ beta(bshape, bscale); //informative prior
log_theta ~ normal(0,2); // prior for theta
for(n in 1:n_site) {
for(j in 1:n_gs) {
for(s in 1:S) {
target += poisson_lpmf(n_obs[n,j,s] | lambda[s,n,j]);
}
y[n,,j] ~ multinomial_logit(to_vector(log_p_raw[n,,j]));
}
for(s in 1:S) {
target += lp_site[s, n];
}
}
}
generated quantities {
array[S, n_site,n_gs] real n_obs_pred;
array[S, n_site, n_gs] real n_obs_true;
array[S, n_site] real N_site;
array[S, n_site] real N_site_pred;
array[n_site, max_int_dist+1] real DetCurve;
array[n_site, n_gs] real log_lik1;
array[S, n_site, n_gs] real log_lik2;
array[n_site, n_gs] real log_lik2_site;
array[n_site] real log_lik;
//array[S, n_site] real Site_lambda;
real av_gs[S];
array[S] simplex[n_gs] eps_gs_ave;
array[S, npc] real pred;
//array[S, npc] real pred_trunc;
array[S, np_reg] real Nhat_reg;
// array[np_reg] real Nhat_reg_design;
real Nhat[S];
//real Nhat_trunc[S];
real Nhat_sum;
int trunc_counter[S];
trunc_counter[S] = 0;
for(s in 1:S) {
eps_gs_ave[s] = exp(zeta[s])/sum(exp(zeta[s]));
av_gs[s] = sum(gs .* eps_gs_ave[s]); //average group size
}
for(n in 1:n_site) {
for(j in 1:n_gs) {
log_lik1[n,j] = multinomial_logit_lpmf(y[n,,j] | to_vector(log_p_raw[n,,j])); //for loo
for(s in 1:S) {
log_lik2[s,n,j] = poisson_lpmf(n_obs[n,j,s] | lambda[s,n,j]); //for loo
n_obs_true[s, n, j] = gs[j] * (poisson_log_rng(log_lambda_psi[s,n] + log(eps_ngs[s,n,j])));
n_obs_pred[s, n,j] = gs[j] * poisson_rng(lambda[s,n,j]);
}
log_lik2_site[n, j] = log_sum_exp(log_lik2[,n,j]);
}
// get loglik on a site level
log_lik[n] = log_sum_exp(log_sum_exp(log_sum_exp(log_lik1[n,]),
log_sum_exp(log_lik2_site[n,])), log_sum_exp(lp_site[,n]));
for(s in 1:S) {
//Site_lambda[s,n] = exp(log_lambda_psi[s,n]);
N_site[s,n] = sum(n_obs_true[s,n,]);
N_site_pred[s,n] = sum(n_obs_pred[s,n,]);
}
// loop over distance bins
if(keyfun == 0) {
for(j in 0:max_int_dist) { // get DS predictions for distance 0 to max bin distance
DetCurve[n, j+1] = exp(-(j+0.5)^2/(2*sigma[n]^2)); // half normal
}
} else if(keyfun == 1) {
for(j in 0:max_int_dist) { // get DS predictions for distance 0 to max bin distance
DetCurve[n, j+1] = 1 - exp(-(j+0.5/sigma[n])^(-theta)); //hazard rate
}
}
}
for(s in 1:S) {
for(i in 1:np_reg) Nhat_reg[s,i] = 0;
for(i in 1:npc) {
pred[s,i] = poisson_log_rng(X_pred_psi[i,] * beta_psi[s] + eps_bioregion[s, pred_bioreg[i]]) * prop_pred[i] * av_gs[s]; //offset
if(pred[s,i] > max(N_site[s,])) {
//pred_trunc[s,i] = max(N_site[s,]);
trunc_counter[s] += 1;
}
Nhat_reg[s,pred_reg[i]] += pred[s,i];
}
Nhat[s] = sum(pred[s,]);
//Nhat_trunc[s] = sum(pred_trunc[s,]);
}
Nhat_sum = sum(Nhat);
}functions {
/* Half-normal function
* Args:
* sigma: sigma
* midpoints: midpoints
* Returns:
* detection probability
*/
vector halfnorm(real sigma, vector midpoints) {
int bins = rows(midpoints);
vector[bins] p_raw; // detection probability
p_raw = exp(-(midpoints)^2/(2*sigma^2));
return p_raw;
}
/* Hazard function
* Args:
* sigma: sigma
* theta: theta
* midpoints: midpoints
* Returns:
* detection probability
*/
vector hazard(real sigma, real theta, vector midpoints) {
int bins = rows(midpoints);
vector[bins] p_raw; // detection probability
p_raw = 1 - exp(-(midpoints/sigma)^(-theta));
return p_raw;
}
vector prob_dist(real sigma, real theta, int keyfun, vector midpoints){
int bins = rows(midpoints);
vector[bins] out; // detection probability
if(keyfun == 0){
out = halfnorm(sigma, midpoints);
} else if(keyfun == 1){
out = hazard(sigma, theta, midpoints);
}
return out;
}
}
data {
int<lower=0> N; // number of observations
int<lower=0> S; // number of species
real delta; // bin width
int<lower=1> n_site; // site
int<lower=1> n_distance_bins; // distance bins
int<lower=1> n_gs; // number of group sizes
vector[n_gs] gs; // group sizes
vector[n_distance_bins] midpts; // midpt of each interval
real<lower=1> max_distance; // truncation distance
int<lower=1> max_int_dist; // max distance as an integer
real<lower=0> theta_frac; // fraction of camera view
array[n_site] int effort; // effort
array[n_site, n_gs, S] int n_obs; //number of observations
array[n_site, n_distance_bins, n_gs] int y; // observations matrix
// summary of whether species is known to be present at each site
int<lower = 0, upper = 1> any_seen[S, n_site];
// number of surveys at each site
int<lower = 0> n_survey[n_site];
// availability prior information
real<lower=0> bshape; // availability shape
real<lower=0> bscale; // availability scale
// camera trap detection parameters
int<lower=0> det_ncb; // number of covariates for detection model
matrix[n_site, det_ncb] det_model_matrix; // detection model matrix
array[n_gs, n_distance_bins] real pa ; // averaged availability for multiple individuals
// Abundance/occupancy model matrix
int<lower = 1> m_psi;
matrix[n_site, m_psi] X_psi;
// negbinom scale
// real reciprocal_phi_scale;
//transect level information
int<lower=1> trans; // total number of transects across all sites for all methods
array[S, trans] int<lower = 0, upper = 1> y2; // transect based binomial detections
int<lower = 0, upper = trans> start_idx[n_site];
int<lower = 0, upper = trans> end_idx[n_site];
int<lower=1> trans_det_ncb; // number of covariates for transect detection model
matrix[trans, trans_det_ncb] trans_pred_matrix; // transect detection model matrix
int<lower=1> n_max[n_site, S]; // max for poisson RN
// Prediction data
int<lower=1> npc; // number of prediction grid cells
matrix[npc, m_psi] X_pred_psi; // pred matrix
vector[npc] prop_pred; //offset
// bioregion RE
int<lower=1> np_bioreg;
int<lower=1> site_bioreg[n_site];
int<lower=1> pred_bioreg[npc];
// region data
int<lower=1> np_reg;
int<lower=1> site_reg[n_site];
int<lower=1> pred_reg[npc];
// key function, 0 = halfnorm
int keyfun;
}
transformed data {
// vector[n_distance_bins] pi; // availability for point
vector[n_site] survey_area;
array[S, n_site] real cam_seen;
for(i in 1:n_site) {
survey_area[i] = theta_frac * effort[i] * pi() * (max_distance/1000)^2;
for(s in 1:S) {
cam_seen[s,i] = sum(n_obs[i,,s]);
}
}
}
parameters {
// abundance parameters
//array[n_site] simplex[n_gs] eps_ngs; // random variation in group size
array[S] vector[m_psi] beta_psi;
vector[det_ncb] beta_det;
real log_theta;
// transect detection parameters
vector[trans_det_ncb] beta_trans_det;
// temporal availability parameters
real<lower=0, upper=1> activ;
// bioregion RE
// array[S] real<lower=0> bioregion_sd;
// array[S] vector[np_bioreg] bioregion_raw;
// bioregion RE ZIP
array[S] real bioregion_mu_zip;
array[S] real<lower=0> bioregion_sd_zip;
array[S] vector[np_bioreg] bioregion_raw_zip;
// eps group size params
array[S] vector[n_gs] zeta;
array[S] matrix[n_site, n_gs] eps_raw;
array[S] real<lower=0> grp_sd;
}
transformed parameters {
// random effects
// array[S] vector[np_bioreg] eps_bioregion; // bioregion random effect
array[S] vector[np_bioreg] eps_bioregion_zip; // bioregion random effect
// distance parameters
array[n_site] real log_sigma;
array[n_site] real sigma;
array[n_site] vector[n_distance_bins] p_raw; // detection probability
array[n_site, n_distance_bins, n_gs] real p_raw_scale; // detection probability for point independence model and accounting for availability
array[n_site, n_distance_bins, n_gs] real<upper=0> log_p_raw; // log detection probability accounting for availability
array[n_site, n_gs] real log_p; // log probability with summed exponential for the multinomial logit model
array[n_site, n_gs] real<lower=0,upper=1> p; // log probability with summed exponential for the multinomial logit model
// abundance parameters
array[S, n_site, n_gs] real<lower=0> lambda;
// activity parameters
real log_activ = log(activ);
vector[trans] logit_trans_p = trans_pred_matrix * beta_trans_det; // observation process model
// lp_site for RN model
array[S, n_site] real lp_site;
vector<lower=0,upper=1>[trans] r = inv_logit(logit_trans_p);
array[S] vector[n_site] log_lambda_psi;
// eps group size
array[S, n_site] simplex[n_gs] eps_ngs; // random variation in group size
array[S, n_site] vector[n_gs] epsi;
array[S] matrix[n_site, n_gs] eps_site;
real<lower=0> theta = exp(log_theta);
array[S] vector[n_site] logit_phi;
array[S] vector<lower=0, upper=1>[n_site] phi;
for(s in 1:S) {
for(b in 1:np_bioreg) {
// eps_bioregion[s,b] = bioregion_sd[s] * bioregion_raw[s,b];
eps_bioregion_zip[s,b] = bioregion_mu_zip[s] + bioregion_sd_zip[s] * bioregion_raw_zip[s,b];
}
}
for(n in 1:n_site) {
log_sigma[n] = det_model_matrix[n,] * beta_det;
sigma[n] = exp(log_sigma[n]);
p_raw[n] = prob_dist(sigma[n], theta, keyfun, midpts);
for (i in 1:n_distance_bins) {
// assuming a half-normal detection fn from line
// p_raw[n,i] = exp(-(midpts[i])^2/(2*sigma[n]^2)); // half-normal function (pg 419 of AHM)
// hazard function
// p_raw[n,i] = 1 - exp(-(midpts[i]/theta)^(-sigma[n])); // hazard function (pg 507 of AHM)
for(j in 1:n_gs) {
p_raw_scale[n,i,j] = p_raw[n,i]*pa[j,i]; // pr(animal occurs and is detected in bin i)
log_p_raw[n,i,j] = log(p_raw_scale[n,i,j]);
}
}
for(s in 1:S) {
// define log lambda
logit_phi[s,n] = eps_bioregion_zip[s,site_bioreg[n]];
phi[s,n] = inv_logit(logit_phi[s,n]);
log_lambda_psi[s,n] = X_psi[n,] * beta_psi[s]; // + eps_bioregion[s,site_bioreg[n]];
for(j in 1:n_gs) {
eps_site[s, n,j] = grp_sd[s] * eps_raw[s,n,j];
epsi[s,n,j] = exp(zeta[s,j] + eps_site[s,n,j]);
}
eps_ngs[s,n,] = epsi[s,n,]/sum(epsi[s,n,]);
}
for(j in 1:n_gs) {
log_p[n,j] = log_sum_exp(log_p_raw[n,,j]);
p[n,j] = exp(log_p[n,j]);
// model site abundance
for(s in 1:S) {
lambda[s,n,j] = exp(log_lambda_psi[s,n] + log_p[n,j] + log_activ + log(eps_ngs[s,n,j]) + log(r[start_idx[n]])) .* survey_area[n];
}
}
// Royle-Nichols implementation in STAN (looping over possible discrete values of N)
// https://discourse.mc-stan.org/t/royle-and-nichols/14150
// https://discourse.mc-stan.org/t/identifiability-across-levels-in-occupancy-model/5340/2
for(s in 1:S) {
if (n_survey[n] > 0) {
vector[n_max[n,s]] lp;
if(any_seen[s,n] == 0){ // not seen
array[2] real lpx;
lpx[1] = log_sum_exp(log1m(phi[s,n]), log(phi[s,n]) + poisson_log_lpmf(0 | log_lambda_psi[s,n]));
lpx[2] = log(phi[s,n]) + poisson_log_lpmf(1 | log_lambda_psi[s,n]) + bernoulli_lpmf(y2[s,start_idx[n]:end_idx[n]] | r[start_idx[n]:end_idx[n]]);
lp[1] = log_sum_exp(lpx);
} else {
lp[1] = log(phi[s,n]) + poisson_log_lpmf(1 | log_lambda_psi[s,n]) + bernoulli_lpmf(y2[s,start_idx[n]:end_idx[n]] | r[start_idx[n]:end_idx[n]]);
}
for (k in 2:n_max[n,s]){
lp[k] = log(phi[s,n]) + poisson_log_lpmf(k | log_lambda_psi[s,n]) + bernoulli_lpmf(y2[s,start_idx[n]:end_idx[n]] | 1-(1-r[start_idx[n]:end_idx[n]])^k);
}
lp_site[s,n] = log_sum_exp(lp);
} else {
lp_site[s, n] = 0;
}
}
}
}
model {
for(s in 1:S) {
beta_psi[s] ~ normal(0, 1); // prior for poisson model
// bioregion_sd[s] ~ student_t(4, 0, 1);
// bioregion_raw[s,] ~ normal(0,1);
bioregion_mu_zip[s] ~ normal(0,2);
bioregion_sd_zip[s] ~ student_t(4, 0, 1);
bioregion_raw_zip[s,] ~ normal(0,1);
to_vector(eps_raw[s,,]) ~ std_normal();
grp_sd[s] ~ normal(0, 1);
zeta[s,] ~ normal(0, 2);
}
beta_trans_det ~ normal(0, 2); // beta for transect detection
beta_det ~ normal(0, 4); // prior for sigma
activ ~ beta(bshape, bscale); //informative prior
log_theta ~ normal(0,2); // prior for theta
for(n in 1:n_site) {
for(j in 1:n_gs) {
for(s in 1:S) {
if(any_seen[s,n] > 0) {
target += bernoulli_lpmf(1 | phi[s,n]) +
poisson_lpmf(n_obs[n,j,s] | lambda[s,n,j]);
} else {
target += log_sum_exp(bernoulli_lpmf(0 | phi[s,n]), bernoulli_lpmf(1 | phi[s,n]) +
poisson_lpmf(n_obs[n,j,s] | lambda[s,n,j]));
}
}
y[n,,j] ~ multinomial_logit(to_vector(log_p_raw[n,,j]));
}
for(s in 1:S) {
target += lp_site[s, n];
}
}
}
generated quantities {
array[S, n_site,n_gs] real n_obs_pred;
array[S, n_site, n_gs] real n_obs_true;
array[S, n_site] real N_site;
array[S, n_site] real N_site_pred;
array[n_site, max_int_dist+1] real DetCurve;
array[n_site, n_gs] real log_lik1;
array[S, n_site, n_gs] real log_lik2;
array[n_site, n_gs] real log_lik2_site;
array[n_site] real log_lik;
//array[S, n_site] real Site_lambda;
real av_gs[S];
array[S] simplex[n_gs] eps_gs_ave;
array[S, npc] real pred;
array[S, np_reg] real Nhat_reg;
// array[np_reg] real Nhat_reg_design;
real Nhat[S];
real Nhat_sum;
int trunc_counter[S];
trunc_counter[S] = 0;
//array[S] vector<lower=0,upper=1>[npc] phi_pred;
for(s in 1:S) {
eps_gs_ave[s] = exp(zeta[s])/sum(exp(zeta[s]));
av_gs[s] = sum(gs .* eps_gs_ave[s]); //average group size
}
for(n in 1:n_site) {
for(j in 1:n_gs) {
log_lik1[n,j] = multinomial_logit_lpmf(y[n,,j] | to_vector(log_p_raw[n,,j])); //for loo
for(s in 1:S) {
log_lik2[s,n,j] = poisson_lpmf(n_obs[n,j,s] | lambda[s,n,j]); //for loo
n_obs_true[s, n, j] = gs[j] * (poisson_log_rng(log_lambda_psi[s,n] + log(eps_ngs[s,n,j])));
n_obs_pred[s, n, j] = gs[j] * poisson_rng(lambda[s,n,j]);
}
log_lik2_site[n, j] = log_sum_exp(log_lik2[,n,j]);
}
// get loglik on a site level
log_lik[n] = log_sum_exp(log_sum_exp(log_sum_exp(log_lik1[n,]),
log_sum_exp(log_lik2_site[n,])), log_sum_exp(lp_site[,n]));
for(s in 1:S) {
real log_p_unobs;
//Site_lambda[s,n] = exp(log_lambda_psi[s,n]);
N_site[s,n] = sum(n_obs_true[s,n,]);
N_site_pred[s,n] = sum(n_obs_pred[s,n,]);
// account for zero inflation
if(any_seen[s,n] == 0) {
if(bernoulli_rng(phi[s,n])==0)
N_site[s,n] = 0;
log_p_unobs = log1m(phi[s,n]) + poisson_log_lpmf(0 | log_lambda_psi[s,n]);
//log_p_unobs = log(phi[s,n]) + poisson_log_lpmf(0 | log_lambda_psi[s,n]);
if(bernoulli_rng(exp(log_p_unobs))==0)
N_site_pred[s,n] = 0;
}
}
// loop over distance bins
if(keyfun == 0) {
for(j in 0:max_int_dist) { // get DS predictions for distance 0 to max bin distance
DetCurve[n, j+1] = exp(-(j+0.5)^2/(2*sigma[n]^2)); // half normal
}
} else if(keyfun == 1) {
for(j in 0:max_int_dist) { // get DS predictions for distance 0 to max bin distance
DetCurve[n, j+1] = 1 - exp(-(j+0.5/sigma[n])^(-theta)); //hazard rate
}
}
}
for(s in 1:S) {
for(i in 1:np_reg) Nhat_reg[s,i] = 0;
for(i in 1:npc) {
if(bernoulli_logit_rng(eps_bioregion_zip[s, pred_bioreg[i]]) == 1) {
pred[s,i] = poisson_log_rng(X_pred_psi[i,] * beta_psi[s]) * prop_pred[i] * av_gs[s]; //offset
} else {
pred[s,i] = 0;
}
if(pred[s,i] > max(N_site[s,])) {
trunc_counter[s] += 1;
} // upper limit placed at highest site estimate
Nhat_reg[s,pred_reg[i]] += pred[s,i];
}
Nhat[s] = sum(pred[s,]);
}
Nhat_sum = sum(Nhat);
}model_poisson_ms <- cmdstan_model(here::here("stan", "count_det_nondet_poisson_ms.stan"))
model_poisson_ms_zip <- cmdstan_model(here::here("stan", "count_det_nondet_poisson_ms_zip.stan"))
multispecies_data$keyfun <- 0
multispecies_model_zip <- model_poisson_ms_zip$sample(data = multispecies_data,
chains = 8,
parallel_chains = 8,
init = 0.1,
max_treedepth = 10,
refresh = 20,
show_messages = TRUE,
save_warmup = FALSE,
iter_sampling = 200,
iter_warmup = 200)
multispecies_model_zip$save_object("outputs/models/multispecies_model_zip.rds")
multispecies_model <- model_poisson_ms$sample(data = multispecies_data,
chains = 8,
parallel_chains = 8,
init = 0.1,
max_treedepth = 10,
refresh = 20,
show_messages = TRUE,
save_warmup = FALSE,
iter_sampling = 200,
iter_warmup = 200)
multispecies_model$save_object("outputs/models/multispecies_model.rds")
We can fit models using cmdstanr. Here we fit models using a poisson distribution. To aid convergence adapt_delta is increased from the default value. The models we compare are:
integrated_model <- c(TRUE, TRUE, TRUE, FALSE)
distributions <- c("poisson")
model_fits <- list()
for(j in 1:length(distributions)) {
for(i in 1:length(deer_species_all)) {
if(integrated_model[i]) {
model_to_fit <- get(paste0("model_", distributions[j]))
} else {
model_to_fit <- get(paste0("model_", distributions[j], "_co"))
}
model_fits[[i]] <- model_to_fit$sample(data = model_data[[i]],
chains = nc,
parallel_chains = nc,
init = 0.1,
max_treedepth = 10,
refresh = 20,
step_size = 0.01,
adapt_delta = 0.99,
show_messages = TRUE,
save_warmup = FALSE,
iter_sampling = ni,
iter_warmup = nw)
model_fits[[i]]$save_object(paste0("outputs/models/fit_",
distributions[j],
"_",
deer_species_all[i],".rds"))
}
}
multispecies_model_zip <- readRDS("outputs/models/multispecies_model_zip.rds")
Our strategy for model evaluation is to determine if the models of the sparse regression approach (horseshoe-prior regularised model) preform sufficiently well against a null model (no abundance fixed-effect covariates).
We use leave-one-out cross-validation (LOO-CV); a Bayesian model evaluation process (Vehtari, Gelman, and Gabry 2017; Vehtari et al. 2020) to compare the model against a null model. As long as the model with ‘all’ parameters performs better than the null models it is suitable to use it exclusively for all analyses and predictions. Importantly, the cross-validation also gives us the ability to compare the predictive performance against a null model. In doing so we see that the model with abundance covariates performs better than a null model. Noting that the null model will still include a random bioregion effect.
loo_compare_table <- loo::loo_compare(multispecies_models[[1]]$loo(),
multispecies_models[[2]]$loo())
# Plot loo table
gt(loo_compare_table %>%
as.data.frame()) %>%
fmt_number(decimals = 2) %>%
tab_style(locations = cells_row_groups(), style = cell_text(weight = "bold"))
Posterior predictive checks allow us to compare the observed data to the model-generated data. For each species we undertake posterior predictive checks for summary statistics relating to the number of deer seen on the cameras at each site. Ideally a well-fit model is able to make predictions that match the observed data. The summary statistics we use for the posterior predictive checks are:
q95 <- function(x) quantile(x, 0.95, na.rm = T)
# q25 <- function(x) quantile(x, 0.25, na.rm = T)
q75 <- function(x) quantile(x, 0.75, na.rm = T)
sd_90 <- function (x, na.rm = FALSE) {
quants <- quantile(x, c(0.05, 0.95), na.rm = T)
x <- x[x < quants[2] & x > quants[1]]
sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
na.rm = na.rm))
}
funs <- c(max, mean, sd, ppc_scatter_avg, prop_zero, ppc_rootogram)
titles <- c("Max", "Mean", "SD", "Average Site Counts", "Proportion Zeros", "Histogram of Counts")
ppc_sambar <- list()
# funs <- c(prop_zero, mean, q90, sd)
for(i in 1:length(funs)) {
ppc_sambar[[i]] <- posterior_checks_multispecies(model = multispecies_model_zip,
model_data = multispecies_data, species_index = 1,
stat = funs[[i]], integrated = T,
title = titles[i])
}
cowplot::plot_grid(plotlist = ppc_sambar, labels = "AUTO", ncol = 2)
Figure 1: Posterior predictive checks for Sambar Deer
ppc_fallow <- list()
for(i in 1:length(funs)) {
ppc_fallow[[i]] <- posterior_checks_multispecies(model = multispecies_model_zip,
model_data = multispecies_data, species_index = 2,
stat = funs[[i]], integrated = T,
title = titles[i])
}
cowplot::plot_grid(plotlist = ppc_fallow, labels = "AUTO", ncol = 2)
Figure 2: Posterior predictive checks for Fallow Deer
Figure 3: Posterior predictive checks for Red Deer
Within the STAN model we generate predictions for sampled and unsampled locations. This provides us with site-level abundance estimates as well as estimates across all (unsampled) public forest.
Visualisation of point-estimates for the various deer species surveyed for in this study.
vic_regions <- vicmap_query("open-data-platform:delwp_region") %>%
collect() %>%
st_transform(3111) %>%
st_simplify(dTolerance = 500)
site_preds <- function(model, cams_curated, species_index) {
rn_dens <- model$summary("N_site")
which_sp <- which(stringr::str_detect(string = rn_dens$variable,
pattern = paste0("N_site\\[",
species_index)))
density_at_sites_rn <- cbind(cams_curated, rn_dens[which_sp, ])
return(density_at_sites_rn)
}
site_density_plot <- function(densities, regions, species) {
densities$density <- cut(densities$mean,
breaks = c(0, 0.1, 1, 3, 5, 10, max(densities$mean)),
labels = c("<0.1", "0.1 - 1", "1-3", "3 - 5", "5 - 10", "10+"), include.lowest = T, right = T)
if(species == "Hog") {
regions <- regions %>% filter(delwp_region == "GIPPSLAND")
densities <- densities %>% st_filter(regions %>% st_transform(4283))
}
plot <- ggplot2::ggplot(data = densities) +
ggplot2::geom_sf(data = regions, alpha = 0.75, fill = "grey80") +
ggplot2::geom_sf(aes(fill = density, alpha = mean), shape = 21, size = 4) +
scale_fill_viridis_d(name = "", guide = guide_legend(override.aes = list(size = 6))) +
scale_alpha_continuous(range = c(0.5,1), guide = "none") +
labs(title = paste0('Density of ',species ,' Deer'), fill = bquote('Deer per'~km^2)) +
theme_bw() +
theme(legend.text = element_text(size = 18), legend.key.size = unit(1, "cm"),
title = element_text(size = 22))
return(plot)
}
sambar_preds <- site_preds(multispecies_model_zip, cams_curated = cams_curated, species_index = 1)
fallow_preds <- site_preds(multispecies_model_zip, cams_curated = cams_curated, species_index = 2)
red_preds <- site_preds(multispecies_model_zip, cams_curated = cams_curated, species_index = 3)
cowplot::plot_grid(site_density_plot(sambar_preds, vic_regions, species = "Sambar"),
site_density_plot(fallow_preds, vic_regions, species = "Fallow"),
site_density_plot(red_preds, vic_regions, species = "Red"), ncol = 1)
Figure 4: Site-level density estimates for all sites sampled as part of statewide and hog deer surveys
As a sanity-check we compare the average modelled densities at sites with (i) no evidence of deer, (ii) evidence of deer present on transects, and (iii) evidence of deer present on cameras. We expect that average densities are generally higher at sites that detected some form of deer than sites that did not detect any sign of deer. Additionally, we would also expect average densities to be generally higher at sites that had detections on cameras than those with only detections from transects. The table below shows these expectations to be correct for Sambar and partially for Fallow. Red Deer had too few detections for this check to be comprehensive but alongside Hog Deer, Red Deer density was higher at sites with detections on the camera than those where they weren’t seen.
density_summary_table <- function(preds, model_data, species, species_index) {
cam_seen <- as.integer(as.logical(rowSums(model_data$n_obs[,,species_index])))
preds_sum <- preds %>%
st_drop_geometry() %>%
mutate(`Species` = species,
`CameraDetection` = cam_seen,
`AnyDetection` = model_data$any_seen[species_index, ],
`Detection` = case_when(CameraDetection == 0 & AnyDetection == 0 ~ "Not seen",
CameraDetection == 0 & AnyDetection == 1 ~ "Only detected on transects",
CameraDetection == 1 & AnyDetection == 1 ~ "Seen on cameras")) %>%
group_by(`Species`, `Detection`) %>%
summarise(`Number of Sites` = n(),
`Average Density` = mean(mean)) %>%
ungroup()
return(preds_sum)
}
density_summary <- bind_rows(
density_summary_table(sambar_preds, multispecies_data,
species = "Sambar", species_index = 1),
density_summary_table(fallow_preds, multispecies_data,
species = "Fallow", species_index = 2),
density_summary_table(red_preds, multispecies_data,
species = "Red", species_index = 3))
density_summary %>%
kbl(format = "html", digits = 2) %>%
kable_styling("striped") %>%
collapse_rows(1)
| Species | Detection | Number of Sites | Average Density |
|---|---|---|---|
| Sambar | Not seen | 180 | 1.01 |
| Only detected on transects | 33 | 2.08 | |
| Seen on cameras | 104 | 2.78 | |
| Fallow | Not seen | 253 | 1.08 |
| Only detected on transects | 34 | 1.86 | |
| Seen on cameras | 30 | 6.17 | |
| Red | Not seen | 304 | 0.18 |
| Only detected on transects | 1 | 0.81 | |
| Seen on cameras | 12 | 1.22 |
Within our model we calculate abundance/density for deer in each of the 73323 km2 of public land. Based on these spatial predictions we can estimate abundance at a regional level (6 DEECA regions) and across the whole state.
Using the model predictions (“pred”) for all suitable public land we generate a raster (1km2 resolution). We save the average spatial estimates under outputs/rasters (tif files) and also provide binned plots (png files) in outputs/plots.
pred_raster_full <- terra::rast(prediction_raster)
pred_raster <- terra::app(pred_raster_full[[stringr::str_subset(
stringr::str_remove_all(labels(terms(ab_formula_4)),
"scale[(]|[)]|log[(]|sqrt[(]"),
pattern = "[*]", negate = T)]], mean)
mean_raster <- list()
mean_raster_discrete <- list()
gp_preds_draws_all <- multispecies_model_zip$draws("pred", format = "matrix")
for(i in 1:length(deer_species_all)) {
which_sp <- which(stringr::str_detect(string = colnames(gp_preds_draws_all),
pattern = paste0("pred\\[", i)))
gp_preds_draws_sp <- gp_preds_draws_all[,which_sp]
terra::values(pred_raster)[!is.na(terra::values(pred_raster))] <- apply(gp_preds_draws_sp,
2,
mean,
na.rm = T,
trim = 0.1)
mean_raster[[i]] <- pred_raster
mean_raster_discrete[[i]] <- mean_raster[[i]]
max_pred <- max(values(mean_raster[[i]]), na.rm = T)
values(mean_raster_discrete[[i]]) <- cut(values(mean_raster_discrete[[i]]) ,
breaks = c(0, 0.1,1,3,5,10, max_pred),
include.lowest = T, right = T,
labels = c("0 - 0.1",
"0.1 - 1",
"1 - 3",
"3 - 5",
"5 - 10",
"10 +"))
}
# combine mean rasters together
combined_raster <- rast(mean_raster)
names(combined_raster) <- c("Average Sambar Deer Density (per km2)",
"Average Fallow Deer Density (per km2)",
"Average Red Deer Density (per km2)")
writeRaster(combined_raster, "outputs/rasters/combined_deer_average_density.tif", overwrite = T)
# reg <- VicmapR::vicmap_query("open-data-platform:delwp_region") %>%
# VicmapR::collect() %>%
# sf::st_transform(3111) %>%
# sf::st_simplify(dTolerance = 250)
state <- VicmapR::vicmap_query("open-data-platform:vmlite_victoria_polygon_su5") %>%
filter(state != "NSW" & state != "SA" & feature_type_code != "sea") %>%
VicmapR::collect() %>%
sf::st_transform(3111)
gippsland <- vic_regions %>% filter(delwp_region == "GIPPSLAND")
plot_abundance <- function(raster, state, species, crop = NULL) {
if(!is.null(crop)) {
raster <- terra::crop(raster, vect(crop), mask = T)
state <- crop
}
plot <- ggplot2::ggplot() +
ggplot2::geom_sf(data = state, alpha = 0.5, linewidth = 0.5, fill = "grey80") +
tidyterra::geom_spatraster(data = raster, na.rm = T) +
tidyterra::scale_fill_terrain_d(na.translate = FALSE) +
ggplot2::labs(fill = bquote('Deer per'~km^2), title = paste0("Average Density of ", species, " on Victorian Public Land")) +
ggspatial::annotation_scale()
return(plot)
}
sambar_abundance_plot <- plot_abundance(mean_raster_discrete[[1]],
state = state,
species = "Sambar Deer")
fallow_abundance_plot <- plot_abundance(mean_raster_discrete[[2]],
state = state,
species = "Fallow Deer")
red_abundance_plot <- plot_abundance(mean_raster_discrete[[3]],
state = state,
species = "Red Deer")
ggsave(plot = sambar_abundance_plot, filename = "outputs/plots/sambar_abundance_plot.png",
width = 2400, height = 1600, units = "px")
ggsave(plot = fallow_abundance_plot, filename = "outputs/plots/fallow_abundance_plot.png",
width = 2400, height = 1600, units = "px")
ggsave(plot = red_abundance_plot, filename = "outputs/plots/red_abundance_plot.png",
width = 2400, height = 1600, units = "px")
cowplot::plot_grid(sambar_abundance_plot, fallow_abundance_plot, red_abundance_plot, ncol = 1)
Figure 5: Abundance of Sambar, Fallow, Red and Hog Deer across Victoria, dark-grey area reflects area not included in predictions (i.e. not public land)
For each DEECA region we provide species-level estimates of abundance with 90 % confidence interval. We also calculate the modle-based average density within each region based on the abundance and the total area of public land within each region.
reg <- VicmapR::vicmap_query("open-data-platform:delwp_region") %>%
VicmapR::collect() %>%
dplyr::group_by(delwp_region) %>%
dplyr::summarise(geometry = sf::st_combine(geometry)) %>%
dplyr::ungroup() %>%
dplyr::mutate(delwp_region_fact = as.integer(factor(delwp_region)))
abundance_table <- function(model, regions, pred_area, caption, species_index) {
reg <- model$summary("Nhat_reg")
which_sp <- which(stringr::str_detect(string = reg$variable,
pattern = paste0("Nhat_reg\\[", species_index)))
regional_abundance <- reg[which_sp,] %>%
dplyr::bind_rows(model$summary("Nhat")[species_index,]) %>%
dplyr::mutate(variable = c(regions$delwp_region, "TOTAL"),
`Area km2` = c(pred_area, sum(pred_area)),
`Average Density (per km2)` = round(mean/`Area km2`, 2)) %>%
dplyr::select(Region = variable,
Mean = mean,
Median = median,
SD = sd,
`5 %` = q5,
`90 %` = q95,
`Area km2`,
`Average Density (per km2)`)
kableExtra::kbl(regional_abundance, digits = 1, format = "html", caption = caption) %>%
kableExtra::kable_styling("striped") %>%
kableExtra::column_spec(1, bold = TRUE) %>%
kableExtra::row_spec(6, hline_after = T) %>%
kableExtra::row_spec(7, background = "#c2a5cf", color = "black", bold = T, hline_after = T)
}
abundance_table(multispecies_model_zip, regions = reg, pred_area = table(multispecies_data$pred_reg), caption = "Regional estimates of Sambar Deer Density", species_index = 1)
| Region | Mean | Median | SD | 5 % | 90 % | Area km2 | Average Density (per km2) |
|---|---|---|---|---|---|---|---|
| BARWON SOUTH WEST | 265.7 | 228.3 | 147.3 | 100.0 | 562.4 | 4764 | 0.1 |
| GIPPSLAND | 43184.6 | 43101.6 | 2951.3 | 38422.2 | 48279.0 | 24922 | 1.7 |
| GRAMPIANS | 1944.3 | 1908.9 | 309.9 | 1492.5 | 2505.6 | 9515 | 0.2 |
| HUME | 29192.7 | 29120.4 | 2022.7 | 25960.1 | 32635.4 | 16672 | 1.8 |
| LODDON MALLEE | 2191.6 | 1887.5 | 1226.0 | 976.1 | 4583.2 | 15218 | 0.1 |
| PORT PHILLIP | 4818.8 | 4801.4 | 359.4 | 4247.6 | 5408.8 | 2232 | 2.2 |
| TOTAL | 81597.7 | 81363.1 | 5654.7 | 72609.3 | 91171.1 | 73323 | 1.1 |
abundance_table(multispecies_model_zip, regions = reg, pred_area = table(multispecies_data$pred_reg), caption = "Regional estimates of Fallow Deer Density", species_index = 2)
| Region | Mean | Median | SD | 5 % | 90 % | Area km2 | Average Density (per km2) |
|---|---|---|---|---|---|---|---|
| BARWON SOUTH WEST | 15004.6 | 14930.2 | 1751.8 | 12194.6 | 17984.2 | 4764 | 3.1 |
| GIPPSLAND | 13232.6 | 13203.4 | 1122.6 | 11464.6 | 15180.6 | 24922 | 0.5 |
| GRAMPIANS | 7192.8 | 7024.4 | 1181.6 | 5575.8 | 9327.5 | 9515 | 0.8 |
| HUME | 19859.7 | 19840.0 | 1941.3 | 16794.3 | 23183.4 | 16672 | 1.2 |
| LODDON MALLEE | 4097.6 | 4064.9 | 545.6 | 3272.6 | 5036.6 | 15218 | 0.3 |
| PORT PHILLIP | 1897.1 | 1884.9 | 277.9 | 1467.1 | 2369.4 | 2232 | 0.8 |
| TOTAL | 61284.6 | 61208.7 | 4812.9 | 53695.3 | 69391.3 | 73323 | 0.8 |
abundance_table(multispecies_model_zip, regions = reg, pred_area = table(multispecies_data$pred_reg), caption = "Regional estimates of Red Deer Density", species_index = 3)
| Region | Mean | Median | SD | 5 % | 90 % | Area km2 | Average Density (per km2) |
|---|---|---|---|---|---|---|---|
| BARWON SOUTH WEST | 4619.3 | 4251.5 | 1897.7 | 2660.7 | 7503.1 | 4764 | 1.0 |
| GIPPSLAND | 947.8 | 899.1 | 377.2 | 439.5 | 1613.5 | 24922 | 0.0 |
| GRAMPIANS | 3724.5 | 3532.9 | 1174.2 | 2195.1 | 5894.3 | 9515 | 0.4 |
| HUME | 4365.4 | 4136.1 | 1559.5 | 2225.8 | 7233.3 | 16672 | 0.3 |
| LODDON MALLEE | 141.7 | 53.1 | 297.3 | 3.1 | 534.5 | 15218 | 0.0 |
| PORT PHILLIP | 596.5 | 548.1 | 269.9 | 268.7 | 1106.7 | 2232 | 0.3 |
| TOTAL | 14395.2 | 13867.4 | 3908.5 | 9494.1 | 21122.2 | 73323 | 0.2 |
Our model uses covariates to inform three seperate submodels:
At a site, there are two processes that may lead us to not record deer on a camera when in reality they occupy the site (whereby the site is the home range area surrounding the camera.) Firstly, deer may be available for detection and enter the sampling area in front of the camera. However, due to a function of their distance from the camera we fail to detect this individual. Secondly, they may occupy a site (known from transect signs) but never enter the camera field of view, in this case we estimate the various detection rates from the various methods. The detection rate of the camera in this method is regarded as the spatial availability of deer for camera trap sampling.
Below we show the average detection rate for a given group of deer in front of the camera (up to 12.5m). Detection rastes appear to be higher for Sambar than other species.
det_rates <- multispecies_model_zip$summary("p") %>%
mutate(var = stringr::str_extract(variable, "(?<=\\[).+?(?=\\])")) %>%
separate(var, into = c("site", "Group Size"), sep = ",")
av_det_rates <- det_rates %>%
group_by(`Group Size`) %>%
summarise(mean = mean(mean)) %>%
ungroup() %>%
transmute(`Group Size`,
`Average Site Detection Probability` = mean)
av_det_rates %>%
bind_rows() %>%
arrange(`Group Size`) %>%
kbl("html", digits = 3, caption = "Average detection rates for different group sizes") %>%
kable_styling("striped") %>%
collapse_rows(1)
| Group Size | Average Site Detection Probability |
|---|---|
| 1 | 0.428 |
| 2 | 0.558 |
| 3 | 0.638 |
| 4 | 0.693 |
| 5 | 0.733 |
For a group size of 1 we can generate detection function plots that show how detection rates fall-off over varying distances.
Note the following data suggests we should test hazard functions for Fallow, Red and Hog.
site_vars <- dplyr::tbl(con, dbplyr::in_schema("deervic", "curated_site_data")) %>%
dplyr::filter(SiteID %in% !!c(cams_curated$SiteID, "47191")) %>%
dplyr::collect() %>%
dplyr::mutate(HerbaceousUnderstoryCover = NNWHUCover + ENWHUCover,
SiteID = dplyr::case_when(SiteID == "47191" & CameraID == "HO04101053" ~ "47191A",
SiteID == "47191" & CameraID != "HO04101053" ~ "47191B",
TRUE ~ SiteID)) %>% # native + exotic herbaceous cover
dplyr::arrange(SiteID) %>%
as.data.frame()
n_distance_bins <- 5
delta <- 2.5
midpts <- c(1.25, 3.75, 6.25, 8.75, 11.25)
max_distance <- 12.5
det_curve <- multispecies_model_zip$draws("DetCurve", format = "draws_matrix") %>%
as.data.frame() %>%
head(250) %>%
pivot_longer(cols = everything())
det_curve_wr <- det_curve %>%
mutate(var = stringr::str_extract(name, "(?<=\\[).+?(?=\\])")) %>%
separate(var, into = c("s", "Distance"), sep = ",")
det_vars_pred <- site_vars %>%
mutate(s = as.character(1:nrow(.)),
herbaceouslvl = cut(HerbaceousUnderstoryCover,
breaks = c(0, 25, 50, 75, 105),
labels = c("0 - 25%",
"25 - 50%",
"50 - 75%",
"75 - 100%"),
include.lowest = T, right = FALSE))
det_curve_sum <- det_curve_wr %>%
mutate(Distance = as.numeric(Distance)-1) %>%
left_join(det_vars_pred) %>%
group_by(herbaceouslvl, Distance) %>%
summarise(median = quantile(value, 0.5),
q5 = quantile(value, 0.05),
q95 = quantile(value, 0.95))
y_combined <- colSums(rowSums(multispecies_data$y, dims = 2)) %>% # just for group size 1
as.data.frame() %>%
`colnames<-`("Count") %>%
mutate(Distance = midpts,
Prop = Count/(max(Count)),
CountS = Count/(2 * 2.5 * Distance)/max_distance^2,
PropS = CountS/(max(CountS)))
detection_plot_HN <- ggplot(aes(x = Distance), data = det_curve_sum) +
geom_col(aes(y = PropS), fill = "grey70", colour = "grey20", width = 2.5, data = y_combined, alpha = 0.7) +
# geom_errorbar(aes(ymin = PropSq5, ymax = PropSq95), data = y_combined) +
# geom_line(aes(y = HNS)) +
geom_ribbon(aes(ymin = q5, ymax = q95, fill = herbaceouslvl), alpha = 0.5) +
geom_line(aes(y = median, colour = herbaceouslvl)) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0), limits = c(0,1)) +
scale_fill_brewer(palette = "Set1") +
scale_colour_brewer(palette = "Set1") +
ylab("Detection probability (p)") +
labs(fill = "Herbaceous\nUnderstorey\nCover", colour = "Herbaceous\nUnderstorey\nCover") +
theme_classic()
detection_plot_HN
For Sambar, Fallow and Red Deer we were able to estimate detection rates of the various methods used to detect deer (camera, footprints, pellets, rubbings and wallows).
all_draws <- multispecies_model_zip$draws("beta_trans_det", format = "df")
# det_marginal_effects <- list()
# det_plot <- list()
obs_vars <- unlist(stringr::str_remove_all(string = labels(terms(transect_formula)), pattern = "log|scale|\\)|\\("))
obs_cols <- unlist(stringr::str_remove_all(string = colnames(multispecies_data$trans_pred_matrix), pattern = "log|scale|\\)|\\("))
obs_logs <- unlist(stringr::str_detect(string = colnames(multispecies_data$trans_pred_matrix), pattern = "log\\("))
obs_labs <- c("Survey Method")
fac <- c(TRUE, TRUE, TRUE, TRUE, TRUE)
fac2 <- c(TRUE)
det_obs <- multispecies_data$transects %>%
mutate(Survey = factor(Survey))
det_marginal_effects <- list()
det_plot <- list()
params_w_levs <- levels(det_obs$Survey)
for(i in 1:(length(obs_cols))) {
det_marginal_effects[[i]] <- marginal_effects_cmd(all_draws,
param = "beta_trans_det",
param_number = i, log = obs_logs[i],
model_data = det_obs,
model_column = obs_vars[c(1,attr(multispecies_data$trans_pred_matrix, "assign")[-1])[i]],
transition = FALSE) %>%
mutate(group = param,
param = params_w_levs[i],
factor = fac[i],
variable = as.numeric(variable))
if(fac[i]) {
det_marginal_effects[[i]] <- det_marginal_effects[[i]]
}
}
marginal_prob <- function(x, pwr = 3) {
xm <- 1-x
return(1-(xm^pwr))
}
det_marginal_effects_bind <- bind_rows(det_marginal_effects) %>%
rowwise() %>%
mutate(value = case_when(param != "Camera" ~ marginal_prob(value),
TRUE ~ value))
det_marginal_effects_split <- split(det_marginal_effects_bind, det_marginal_effects_bind$group)
for(i in 1:length(det_marginal_effects_split)) {
if(fac2[i]) {
plot_data <- det_marginal_effects_split[[i]] %>%
mutate(variable = param,
param = group)
} else {
plot_data <- det_marginal_effects_split[[i]]
}
det_plot[[i]] <- plot_data
}
data_for_plot <- bind_rows(det_plot) %>%
mutate(species = "All (combined)")
marginal_effects_plot_cmd_all(data_for_plot,
factor = TRUE,
ylab = "Detection probability") +
xlab("Survey method") +
scale_y_continuous(limits = c(0,1)) +
theme(legend.position = "top") +
scale_fill_brewer(palette = "Set1") +
# scale_x_discrete(expand = c(0, 0)) +
theme_classic()
Figure 6: Detection Probability for the various methods of survey. Camera trap detection probability is based on average deployment length, while transects are based on 3 independent transect searches
We used a spatially-derived random-effect of the bioregion of the sampled site. This random-effect allows us the ability to make predictions that include the variance associated with this random effect. This random-effect also minimises predictions in areas without deer detections (e.g. the mallee).
bioregion_contribution <- function(model, data, species, species_index) {
eps_bioregion <- model$draws("eps_bioregion_zip", format = "matrix")
which_sp <- which(stringr::str_detect(string = colnames(eps_bioregion),
pattern = paste0("eps_bioregion_zip\\[", i)))
eps_bioregion <- eps_bioregion[,which_sp]
bioregion_data <- data[["bioreg_sf"]]
colnames(eps_bioregion) <- bioregion_data$bioregion
plot <- mcmc_areas(eps_bioregion, prob_outer = 0.9) +
ggtitle(species)
return(plot)
}
bio_plot_data <- list()
for(i in 1:length(species_names)) {
bio_plot_data[[i]] <- bioregion_contribution(model = multispecies_model_zip,
data = multispecies_data,
species = species_names[i], species_index = i)
}
cowplot::plot_grid(plotlist = bio_plot_data, ncol = 1)
Figure 7: Influence of the bioregion random effect on abundance (log-scale).
ab_joined_list <- list()
for(j in 1:length(species_names)) {
beta_draws <- multispecies_model_zip$draws("beta_psi", format = "df")
# which_sp <- which(stringr::str_detect(string = colnames(beta_draws),
# pattern = paste0("beta_psi\\[", i)))
#
# beta_draws <- beta_draws[,which_sp]
ab_marginal_effects <- list()
ab_plot <- list()
ab_to_use <- ab_formula_4
phi_vars <- unlist(stringr::str_remove_all(string = labels(terms(ab_to_use)), pattern = "log|scale|sqrt|\\)|\\("))
phi_logs <- unlist(stringr::str_detect(string = labels(terms(ab_to_use)), pattern = "log\\("))
phi_sqrts <- c(0.5,0.5,1,1,0.5)
phi_labs <- c("Bare Soil (%)",
"Nitrogen (%)",
"Distance to Pastural Land (m)",
"Precipitation Seasonality",
'Length of forest edge in 1km2 (m)')
fac <- c(FALSE, FALSE, FALSE, FALSE, FALSE)
for(i in 1:length(phi_vars)) {
ab_marginal_effects[[i]] <- marginal_effects_cmd(beta_draws,
param = "beta_psi", species_index = j,
param_number = i+1, log = phi_logs[i],
model_data = multispecies_data[["raw_data"]],
abundance = TRUE,
pwr = phi_sqrts[i],
model_column = phi_vars[i],
transition = FALSE) %>%
mutate(species = species_names[j])
}
ab_joined_list[[j]] <- bind_rows(ab_marginal_effects)
}
all_me_data <- bind_rows(ab_joined_list) %>%
mutate(param = case_when(param == "BareSoil" ~ "Bare Soil (%)",
param == "Nitrogen" ~ "Nitrogen (%)",
param == "PastureDistance" ~ "Distance to Pastural Land (m)",
param == "BIO15" ~ "Precipitation Seasonality",
param == "ForestEdge" ~ 'Amount of Forest Edge per km2 (m)'))
marginal_effects_plot_cmd_all(all_me_data,
col = "DarkGreen",
factor = FALSE,
ylab = "Contribution to Abundance (log-scale)") +
ggplot2::facet_grid(species~param, scales = "free") +
theme_bw() +
xlab("Covariate Value")
Figure 8: Marginal response curves for the various fixed-effect parameters used in the model.
density <- multispecies_zip_1$draws("N_site", format = "matrix")
site_estimates <- list()
site_density <- list()
for(i in 1:n_site) {
site_estimates[[i]] <- character()
for(j in 1:length(deer_species_all)) {
site_estimates[[i]][j] <- paste0("N_site[", j, ",", i, "]")
}
site_density[[i]] <- sum(colMeans(density)[site_estimates[[i]]])
}
site_density_ul <- unlist(site_density)
site_density_df <- cams_curated %>%
mutate(AllDeerDensity = site_density_ul,
Bioregion = factor(multispecies_data$site_bioreg)) %>%
filter(ProjectShortName == "StatewideDeer")
site_vars <- dplyr::tbl(con, dbplyr::in_schema("deervic", "curated_site_data")) %>%
dplyr::filter(SiteID %in% !!c(cams_curated$SiteID, "47191")) %>%
dplyr::collect() %>%
dplyr::mutate(HerbaceousUnderstoryCover = NNWHUCover + ENWHUCover,
SiteID = dplyr::case_when(SiteID == "47191" & CameraID == "HO04101053" ~ "47191A",
SiteID == "47191" & CameraID != "HO04101053" ~ "47191B",
TRUE ~ SiteID)) %>% # native + exotic herbaceous cover
dplyr::arrange(SiteID) %>%
as.data.frame()
site_density_df_joined <- site_density_df %>%
left_join(site_vars)
library(brms)
bform1 <- bf(mvbind(CanopyCov, Saplings, Seedlings, BGroundCover) ~ AllDeerDensity + (1|p|fosternest) + (1|q|dam)) +
set_rescor(TRUE)
fit1 <- brm(bform1, data = BTdata, chains = 2, cores = 2)